Abstract

The functional generalized additive model (FGAM) provides a more flexible non- 
linear functional regression model than the well-studied functional linear regression 
model. This paper restricts attention to the FGAM with identity link and additive er- 
rors, which we will call the additive functional model, a generalization of the functional 
linear model. This paper studies the minimax rate of convergence of predictions from 
the additive functional model in the framework of reproducing kernel Hilbert space. It 
is shown that the optimal rate is determined by the decay rate of the eigenvalues of a 
specific kernel function, which in turn is determined by the reproducing kernel and the 
joint distribution of any two points in the random predictor function. For the special 
case of the functional linear model, this kernel function is jointly determined by the 
covariance function of the predictor function and the reproducing kernel. The easily 
implementable roughness-regularized predictor is shown to achieve the optimal rate of 
convergence. Numerical studies are carried out to illustrate the merits of the predictor. 
Our simulations and real data examples demonstrate a competitive performance against 
the existing approach. 

Keywords: Functional regression, minimax rate of convergence, principal component 
analysis, reproducing kernel Hilbert space. 



1 Introduction 



Functional regression, in particular functional linear regression, has been studied extensively.
Recent synopses include [13 ED], [S], and [IS]. Let $X(\cdot)$ be a random process defined on
$[0, 1]$ and $Y$ be the univariate response variable. Typically, $t$ is restricted to a compact
interval, so the assumption that $t \in [0, 1]$ causes no loss of generality. Suppose we observe $n$
i.i.d. copies of $(Y, X)$, namely $(Y_i, X_i)$, $i = 1, \ldots, n$. The functional linear regression model assumes
that

$$Y_i = \alpha_0 + \int_0^1 \beta_0(t) X_i(t)\,dt + \epsilon_i, \tag{1}$$

where $\alpha_0 \in \mathbb{R}$ is the coefficient constant, $\beta_0 : [0, 1] \to \mathbb{R}$ is the slope function, and the $\epsilon_i$ are
i.i.d. random errors with $E\epsilon_i = 0$ and $E\epsilon_i^2 = \sigma^2$, $0 < \sigma^2 < \infty$. One of the popular methods
for estimating functional linear models is based on functional principal component analysis 
(see, e.g., [11], [20], [23], [1], [12], [9]). In addition, methods of regularization have also been 
applied to the functional linear model (see, e.g., [5], [25], [3]). 

Due to the limitation of the inherent linearity of (1), [8] extended this model to non-
parametric functional models and [TTj discussed functional models that are additive in the
functional principal component scores of the predictor functions. Recently, [13] proposed a
new model called a functional generalized additive model (FGAM). The same model was
studied by [16], who called it the continuously additive model. We will study the special
case of the FGAM with the identity link and continuous errors so that

$$Y_i = \int_0^1 F_0(t, X_i(t))\,dt + \epsilon_i, \tag{2}$$



where $F_0(\cdot, \cdot) : [0, 1]^2 \to \mathbb{R}$ is a bivariate function. Because $F_0$ is nonlinear, $X(t)$ can be
replaced by $G\{X(t)\}$ for a transformation $G$. Since $G$ can be a strictly increasing function
from the entire real line to $[0, 1]$, assuming that $X(t) \in [0, 1]$ also causes no loss of generality.
(In [13], $G_t$ is allowed to depend on $t$ and is an estimate of the CDF of $X(t)$, but we will not
pursue this refinement here.) Model (2) will be called the additive functional model and
contains (1) as a special case with $F_0(t,x) = \alpha_0 + x\beta_0(t)$. The additive functional model
offers increased flexibility compared to (1), while still facilitating interpretation and
estimation. In [13], computational issues of this model were studied and $F_0$ was estimated using
tensor-product B-splines with roughness penalties. In [16], a piecewise constant function
was fit to $F_0$ and the asymptotic properties, e.g., consistency and asymptotic normality, of
predictions based on $F_0$ were studied.
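As a minimal sketch of what the additive functional model computes, the snippet below approximates the integral of a surface evaluated along a predictor curve by the trapezoidal rule. The function name `eta`, the two surfaces, and the toy curve are illustrative assumptions, not quantities from the paper; the linear surface simply shows how the functional linear model arises as a special case.

```python
import math

def eta(F, x, n_grid=1001):
    """Trapezoidal approximation of the integral of F(t, x(t)) over [0, 1]."""
    h = 1.0 / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        t = k * h
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * F(t, x(t))
    return total * h

# Hypothetical linear surface: intercept 0.3 and slope function beta(t) = t.
def F_linear(t, x):
    return 0.3 + x * t

# Hypothetical nonlinear surface, chosen only for illustration.
def F_nonlinear(t, x):
    return t * math.exp(x)

def curve(t):
    """A toy predictor curve x(t) = t taking values in [0, 1]."""
    return t

eta_linear = eta(F_linear, curve)        # approximates 0.3 + 1/3
eta_nonlinear = eta(F_nonlinear, curve)  # approximates the integral of t*exp(t), which equals 1
```

Adding an independent noise term to either value would give one simulated response under the corresponding model.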

In this paper, we study minimax prediction. The unknown bivariate function $F_0$ is
assumed to reside in an RKHS $\mathcal{H}(K)$ with a reproducing kernel $K : [0, 1]^2 \times [0, 1]^2 \to \mathbb{R}$. The
goal of prediction is to recover the functional $\eta_0$:

$$\eta_0(x) = \int_0^1 F_0(t, x(t))\,dt,$$



based on the training sample $(Y_i, X_i)$, $i = 1, \ldots, n$. Let $\hat F_n$ be an estimate of $F_0$ from the
training data. Then its accuracy can be naturally measured by the excess risk:

$$\mathfrak{R}_n := E^*\Big[Y_{n+1} - \int_0^1 \hat F_n(t, X_{n+1}(t))\,dt\Big]^2 - E^*\Big[Y_{n+1} - \int_0^1 F_0(t, X_{n+1}(t))\,dt\Big]^2
= E^*\Big\{\int_0^1 \big[\hat F_n(t, X_{n+1}(t)) - F_0(t, X_{n+1}(t))\big]\,dt\Big\}^2,$$

where $(Y_{n+1}, X_{n+1})$ has the same distribution as $(Y_i, X_i)$ and is independent of
$(Y_i, X_i)$, $i = 1, \ldots, n$, and $E^*$ denotes expectation taken over $(Y_{n+1}, X_{n+1})$ only. It is
interesting to study the rate of convergence of $\mathfrak{R}_n$ as the sample size $n$ increases, which
reflects the difficulty of the prediction problem. A closely related but different problem is
estimation of the bivariate function $F_0$.

The optimal rate of convergence for the prediction problem is established in this paper.
By the spectral theorem, there exist a set of orthonormal eigenfunctions $\{\psi_k : k \ge 1\}$
and a sequence of eigenvalues $\kappa_1 \ge \kappa_2 \ge \cdots > 0$ such that

$$K\big((t,x);(s,y)\big) = \sum_{k=1}^{\infty} \kappa_k \psi_k(t,x)\psi_k(s,y), \qquad
K(\psi_k) := \int\!\!\int K\big(\cdot\,;(s,y)\big)\psi_k(s,y)\,ds\,dy = \kappa_k\psi_k.$$

It is shown that under model (2), the difficulty of the prediction problem as measured by
the minimax rate of convergence depends on the decay rate of the eigenvalues of the kernel
$C : [0, 1]^2 \times [0, 1]^2 \to \mathbb{R}$, where

$$C\big((t,x);(s,y)\big) := \int\!\!\int E\Big[K^{1/2}\big((t,x);(u,X(u))\big)\,K^{1/2}\big((s,y);(v,X(v))\big)\Big]\,du\,dv \tag{3}$$

and $K^{1/2}\big((t,x);(s,y)\big) = \sum_{k=1}^{\infty} \kappa_k^{1/2}\psi_k(t,x)\psi_k(s,y)$. A minimax lower bound is first
derived for the prediction problem. Then a roughness-regularized predictor is introduced
and is shown to attain the rate of convergence given in the lower bound. Therefore, this
estimator is rate-optimal.

The paper is organized as follows. Section 2 establishes the minimax lower bound for 
the rate of convergence of the excess risk. Section 3 develops a predictor using a roughness 
regularization method and shows this predictor is rate-optimal. Section 4 conducts a Monte 
Carlo study to validate the method and we also illustrate the merit of the method by using 
two real data examples. Some discussions are provided in Section 5. The paper ends with 
proofs in Section 6. 



2 Minimax Lower Bound 

In this section, we establish the minimax lower bound for the rate of convergence of the 
excess risk. 

Assume that the unknown $F_0$ resides in a reproducing kernel Hilbert space $\mathcal{H}(K)$ with
a reproducing kernel $K$. It is well-known that $\mathcal{H}(K)$ is a linear functional space endowed
with an inner product $\langle\cdot,\cdot\rangle_{\mathcal{H}(K)}$ such that

$$F(t,x) = \big\langle K\big((t,x);\cdot\big), F\big\rangle_{\mathcal{H}(K)} \quad \text{for any } F \in \mathcal{H}(K).$$

There is a one-to-one relationship between $K$ and $\mathcal{H}(K)$. It follows from (3) that

$$C\big((t,x);(s,y)\big) = \int\!\!\int\!\!\int\!\!\int K^{1/2}\big((t,x);(u,z_1)\big)\,K^{1/2}\big((s,y);(v,z_2)\big)\,g\big((u,z_1);(v,z_2)\big)\,du\,dv\,dz_1\,dz_2,$$

where $g\big((u,z_1);(v,z_2)\big)$ is the joint density function of $(X(u), X(v))$ evaluated at $(z_1, z_2)$.
Similarly, $C$ admits the spectral decomposition,

$$C\big((t,x);(s,y)\big) = \sum_{j=1}^{\infty} \rho_j\phi_j(t,x)\phi_j(s,y),$$

where the $\rho_j$ are the positive eigenvalues in decreasing order and the $\phi_j$ are the
corresponding orthonormal eigenfunctions. We assume $\rho_k \asymp k^{-2r}$ for some constant $0 < r < \infty$,
where for two sequences $a_k, b_k > 0$, $a_k \asymp b_k$ means that $a_k/b_k$ is bounded away from zero
and infinity as $k \to \infty$.

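Theorem 2.1. Suppose that the eigenvalues $\{\rho_k : k \ge 1\}$ of the kernel $C$ in (3) satisfy
$\rho_k \asymp k^{-2r}$ for some constant $0 < r < \infty$. Then the excess prediction risk satisfies

$$\lim_{c \to 0}\,\lim_{n \to \infty}\,\inf_{\hat\eta}\,\sup_{F_0 \in \mathcal{H}(K)} P\big(\mathfrak{R}_n \ge c\,n^{-2r/(2r+1)}\big) = 1, \tag{4}$$

where the infimum is taken over all possible predictors $\hat\eta$ based on $\{(Y_i, X_i) : i = 1, \ldots, n\}$.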

It is interesting to compare Theorem 2.1 with some of the known results when functional
linear regression is the true model. If the bivariate function $F$ is restricted to the specific
form $F(t,x) = \beta(t)x$, where $\beta$ belongs to a reproducing kernel Hilbert space $\mathcal{H}(K)$ with
the reproducing kernel $K : [0, 1] \times [0, 1] \to \mathbb{R}$, then we have a functional linear regression
model. Assume $K(t,s) = \sum_{k=1}^{\infty} \tilde\kappa_k\tilde\psi_k(t)\tilde\psi_k(s)$, where the $(\tilde\kappa_k, \tilde\psi_k)$ are the eigenvalue and
eigenfunction pairs for $K$. It is not hard to see that $K\big((t,x);(s,y)\big) = 3K(t,s)xy =
\sum_{k=1}^{\infty} \kappa_k\psi_k(t,x)\psi_k(s,y)$, where $\kappa_k = \tilde\kappa_k$ and $\psi_k(t,x) = \sqrt{3}\,x\,\tilde\psi_k(t)$. Therefore,

$$C\big((t,x);(s,y)\big) = 3xy\int\!\!\int K^{1/2}(t,u)\,G(u,v)\,K^{1/2}(v,s)\,du\,dv,$$

where $G(u,v) = \mathrm{cov}(X(u), X(v))$ is the covariance function of $X$, so the eigenvalues of $C$
have the same decay rate as the eigenvalues of $K^{1/2}GK^{1/2}$. This special setting coincides
with those considered in [25] and [3]. Results similar to ours have been established in these
papers for this special setting.

3 A Roughness Regularized Estimate 

In this section, we will develop a predictor using a roughness regularization method and
establish that this predictor achieves the optimal rate established in Theorem 2.1.

3.1 Computation

We define the estimate $\hat F_{n\lambda}$ of $F_0$ as the minimizer of the functional

$$\frac{1}{n}\sum_{i=1}^{n}\Big(Y_i - \int_0^1 F(t, X_i(t))\,dt\Big)^2 + \lambda J(F), \tag{5}$$

where $\lambda$ is the tuning parameter and $J(\cdot)$ is a squared semi-norm on $\mathcal{H}(K)$. The first term
measures the closeness of the fit to the data, the second term controls the smoothness of
the estimate, and the tuning parameter $\lambda$ adjusts the trade-off between these two. The
estimate $\hat F_{n\lambda}$ can be computed explicitly over the infinite-dimensional function space
$\mathcal{H}(K)$. This observation is important to both numerical implementation of the procedure
and our asymptotic analysis.

Let $\mathcal{H}_0$ be the null space of $J$, i.e., $\mathcal{H}_0 = \{F \in \mathcal{H} : J(F) = 0\}$. Assume that $\{\xi_1, \ldots, \xi_N\}$
is an orthonormal basis of $\mathcal{H}_0$ with $N = \dim(\mathcal{H}_0) < \infty$. Let $\mathcal{H}_1$ be its orthogonal
complement in $\mathcal{H}$ such that $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$.

Theorem 3.1. The minimizer of (5) over $\mathcal{H}(K)$ can be represented by

$$\hat F_{n\lambda}(t,x) = \sum_{j=1}^{N} d_j\xi_j(t,x) + \sum_{i=1}^{n} c_i\int_0^1 K\big((t,x);(s,X_i(s))\big)\,ds, \tag{6}$$

for some $c = (c_1, \ldots, c_n)^T \in \mathbb{R}^n$ and $d = (d_1, \ldots, d_N)^T \in \mathbb{R}^N$.

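Denote by $\Sigma$ the $n \times n$ matrix with $\Sigma_{ij} = \int\!\!\int K\big((t, X_j(t));(s, X_i(s))\big)\,dt\,ds$, and by
$\Xi$ the $n \times N$ matrix with $\Xi_{ij} = \int \xi_j(t, X_i(t))\,dt$. Then, (5) may be written in the matrix
form

$$\frac{1}{n}\|Y - \Xi d - \Sigma c\|_2^2 + \lambda c^T\Sigma c, \tag{7}$$

where $J(F) = c^T\Sigma c$. It is easy to see that the solution of the linear system

$$(\Sigma + n\lambda I)c + \Xi d = Y, \tag{8}$$
$$\Xi^T\Sigma c + \Xi^T\Xi d = \Xi^T Y, \tag{9}$$

is a solution of (7). Multiplying (8) by $\Xi^T$ and subtracting (9) gives $n\lambda\,\Xi^T c = 0$, so
$\Xi^T c = 0$. Suppose $\Xi$ is of full column rank. Let

$$\Xi = QR^* = (Q_1, Q_2)\begin{pmatrix} R \\ 0 \end{pmatrix} = Q_1R$$

be the QR-decomposition of $\Xi$ with $Q$ orthogonal and $R$ upper-triangular. From $\Xi^T c = 0$,
$Q_1^T c = 0$, so $c \perp \mathrm{col}(Q_1)$, the column space of $Q_1$. Since $Q$ is orthogonal, $c \in \mathrm{col}(Q_2)$ and
$c = Q_2Q_2^T c$ because $Q_2Q_2^T$ projects onto $\mathrm{col}(Q_2)$. Simple algebra gives

$$c = Q_2\big(Q_2^T\Sigma Q_2 + n\lambda I\big)^{-1}Q_2^T Y,$$

$$d = R^{-1}\big(Q_1^T Y - Q_1^T\Sigma c\big).$$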
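The QR-based solution above can be sketched numerically as follows. This is only an illustration under stated assumptions: `kernel_matrix` stands for the $n \times n$ kernel matrix, `null_design` for the $n \times N$ matrix of integrated null-space basis functions, and both are filled with random numbers here rather than computed from functional data. The function name `solve_coefficients` is hypothetical.

```python
import numpy as np

def solve_coefficients(kernel_matrix, null_design, y, lam):
    """Return (c, d) from the QR factorization of the null-space design matrix."""
    n, n_null = null_design.shape
    # Full QR: q is n x n orthogonal, r_full is n x n_null with zeros below row n_null.
    q, r_full = np.linalg.qr(null_design, mode="complete")
    q1, q2 = q[:, :n_null], q[:, n_null:]
    r = r_full[:n_null, :]
    reduced = q2.T @ kernel_matrix @ q2 + n * lam * np.eye(n - n_null)
    c = q2 @ np.linalg.solve(reduced, q2.T @ y)
    d = np.linalg.solve(r, q1.T @ (y - kernel_matrix @ c))
    return c, d

# Synthetic inputs (assumptions for illustration only).
rng = np.random.default_rng(0)
n, n_null, lam = 12, 2, 0.1
a = rng.normal(size=(n, n))
kernel_matrix = a @ a.T                      # symmetric positive semi-definite
null_design = rng.normal(size=(n, n_null))   # full column rank with probability one
y = rng.normal(size=n)

c, d = solve_coefficients(kernel_matrix, null_design, y, lam)
# Residual of the first equation of the linear system; it should vanish.
residual = (kernel_matrix + n * lam * np.eye(n)) @ c + null_design @ d - y
```

Working in the reduced $(n-N)$-dimensional space avoids solving the full bordered system and enforces the orthogonality constraint on $c$ by construction.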
3.2 Rate of convergence

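In this section, we turn to the asymptotic properties of the estimate $\hat F_{n\lambda}$.

Theorem 3.2. Assume that for any $F \in L_2([0, 1]^2)$,

$$E\Big(\int F(t, X(t))\,dt\Big)^4 \le c\Big(E\Big(\int F(t, X(t))\,dt\Big)^2\Big)^2 \tag{10}$$

for a positive constant $c$. Then,

$$\lim_{A \to \infty}\,\lim_{n \to \infty}\,\sup_{F_0 \in \mathcal{H}(K)} P\big(\mathfrak{R}_n \ge A\,n^{-2r/(2r+1)}\big) = 0, \tag{11}$$

when $\lambda$ is of order $n^{-2r/(2r+1)}$.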

We have made an additional assumption (10) on $X$. For the functional linear
regression model when $F(t,x) = \beta(t)x$, condition (10) shows that, for any $\beta \in L_2([0, 1])$,

$$E\Big(\int \beta(t)X(t)\,dt\Big)^4 \le c\Big(E\Big(\int \beta(t)X(t)\,dt\Big)^2\Big)^2,$$

which states that linear functionals of $X$ have bounded kurtosis. In general, (10) states that
such special nonlinear functionals of $X$ have bounded kurtosis.

It follows from both Theorem 2.1 and Theorem 3.2 that the minimax rate of convergence
for the excess prediction risk is of order $n^{-2r/(2r+1)}$, which is determined by the decay rate
of the eigenvalues of the kernel $C$.
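To make the dependence of the rate on the eigenvalue decay concrete, the sketch below evaluates the rate for a given decay exponent and, as an added illustration that is not part of the paper's method, estimates the exponent from a polynomially decaying sequence by a log-log least-squares fit. The function names and the synthetic eigenvalue sequence are assumptions.

```python
import math

def estimate_decay_exponent(eigenvalues):
    """Fit log(rho_k) against log(k) by least squares; returns r when rho_k ~ k^(-2r)."""
    xs = [math.log(k) for k in range(1, len(eigenvalues) + 1)]
    ys = [math.log(rho) for rho in eigenvalues]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return -slope / 2.0

def minimax_rate(n, r):
    """The rate n^(-2r/(2r+1)) appearing in both the lower and the upper bound."""
    return n ** (-2.0 * r / (2.0 * r + 1.0))

# Synthetic eigenvalues with exact polynomial decay k^(-4), i.e. r = 2.
rho = [3.0 * k ** (-4.0) for k in range(1, 51)]
r_hat = estimate_decay_exponent(rho)
rate = minimax_rate(1000, r_hat)
```

Faster eigenvalue decay (larger exponent) pushes the rate toward the parametric rate, while slow decay makes prediction harder.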

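3.3 Optimal choice of $\lambda$

Let $\hat Y = \big(\hat\eta_{n\lambda}(X_1), \ldots, \hat\eta_{n\lambda}(X_n)\big)^T$. Since the regularized estimator is a linear estimator in
$Y$, $\hat Y = H(\lambda)Y$, where $H(\lambda)$ is called the hat matrix depending on $\lambda$. Some algebra yields

$$H(\lambda) = I - n\lambda\,Q_2\big(Q_2^T\Sigma Q_2 + n\lambda I\big)^{-1}Q_2^T.$$

We may select the tuning parameter $\lambda$ that minimizes the generalized cross-validation score

$$\mathrm{GCV}(\lambda) = \frac{\|\hat Y - Y\|^2/n}{\big[1 - \mathrm{tr}(H(\lambda))/n\big]^2}. \tag{12}$$

Choosing $\lambda$ by minimizing GCV worked very well in our numerical studies.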
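A minimal sketch of tuning-parameter selection by generalized cross-validation is given below. It assumes the kernel matrix and the null-space design matrix are already available (random stand-ins are used here), and it searches a small hypothetical grid rather than performing a continuous optimization. The function names are illustrative.

```python
import numpy as np

def hat_matrix(kernel_matrix, null_design, lam):
    """Hat matrix I - n*lam*Q2 (Q2' S Q2 + n*lam*I)^(-1) Q2' built from a full QR factorization."""
    n, n_null = null_design.shape
    q, _ = np.linalg.qr(null_design, mode="complete")
    q2 = q[:, n_null:]
    reduced = q2.T @ kernel_matrix @ q2 + n * lam * np.eye(n - n_null)
    return np.eye(n) - n * lam * (q2 @ np.linalg.solve(reduced, q2.T))

def gcv_score(kernel_matrix, null_design, y, lam):
    """Mean squared residual divided by the squared average residual leverage."""
    n = y.shape[0]
    h = hat_matrix(kernel_matrix, null_design, lam)
    residual = y - h @ y
    return (residual @ residual / n) / (1.0 - np.trace(h) / n) ** 2

def select_lambda(kernel_matrix, null_design, y, grid):
    """Return the grid value with the smallest GCV score, together with all scores."""
    scores = [gcv_score(kernel_matrix, null_design, y, lam) for lam in grid]
    return grid[int(np.argmin(scores))], scores

# Synthetic inputs (assumptions for illustration only).
rng = np.random.default_rng(1)
n, n_null = 20, 3
a = rng.normal(size=(n, n))
kernel_matrix = a @ a.T
null_design = rng.normal(size=(n, n_null))
y = rng.normal(size=n)
grid = [10.0 ** k for k in range(-4, 2)]

best_lambda, scores = select_lambda(kernel_matrix, null_design, y, grid)
h_best = hat_matrix(kernel_matrix, null_design, best_lambda)
```

Because the hat matrix leaves the null-space columns unchanged, its trace never falls below the null-space dimension, which keeps the GCV denominator away from the fully unpenalized limit only when the penalty is strictly positive.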

4 Numerical Results 

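In our numerical studies, we compare the numerical performance of the proposed predictor
with some well-known existing predictors.
We will focus on an RKHS $\mathcal{H}(K)$ with a squared seminorm

$$J(F) = \sum_{\alpha_1+\alpha_2=m} \frac{m!}{\alpha_1!\,\alpha_2!}\int\!\!\int\Big(\frac{\partial^m F}{\partial t^{\alpha_1}\,\partial x^{\alpha_2}}\Big)^2 dt\,dx.$$

The function $J_m\big(\sqrt{(t-s)^2 + (x-y)^2}\big)$, where $J_m(x) = x^{2m-2}\log x$, acts like a reproducing
kernel in this approach to the computation of thin-plate splines, and hence is called a semi-
kernel ([7], [H]). In this setting, the optimal solution of the roughness-regularized estimate
can be written as

$$\hat F_{n\lambda}(t,x) = \sum_{j=1}^{N} d_j\xi_j(t,x) + \sum_{i=1}^{n} c_i\int_0^1 J_m\Big(\sqrt{(t-s)^2 + (x - X_i(s))^2}\Big)\,ds,$$

where $\xi_j(t,x) = t^{\gamma_1}x^{\gamma_2}$ for some pair of integers $\gamma_1, \gamma_2$ with $0 \le \gamma_1 + \gamma_2 < m$ and $N$ is the
number of such pairs. Let $\hat c$ and $\hat d$ be the estimates from the training data. Then, for any
random function $X$, the predicted response is

$$\hat\eta(X) = \sum_{j=1}^{N} \hat d_j\int_0^1 \xi_j(t, X(t))\,dt + \sum_{i=1}^{n} \hat c_i\int_0^1\!\!\int_0^1 J_m\Big(\sqrt{(t-s)^2 + (X(t) - X_i(s))^2}\Big)\,ds\,dt.$$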
In particular, when $m = 2$, we have $N = 3$, and

$$\xi_1(t,x) = 1, \quad \xi_2(t,x) = t, \quad \xi_3(t,x) = x, \quad J_m(x) = x^2\log x.$$

Note that $\int \xi_1(t, X(t))\,dt = 1$ and $\int \xi_2(t, X(t))\,dt = 1/2$. To avoid an identifiability problem,
we may estimate $d_1$ by $\hat d_1 = \sum_{i=1}^{n} Y_i/n$. In the following, we will use thin-plate splines
with $m = 2$ to fit the data.
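The ingredients for the thin-plate case with second-order penalty can be sketched as below: the semi-kernel, one entry of the kernel matrix obtained by a midpoint rule over the unit square, and the integrated null-space basis for a single curve. The quadrature scheme, grid sizes, function names, and the two toy curves are assumptions made for illustration; the paper does not specify how these integrals are computed.

```python
import math

def semi_kernel(r):
    """Thin-plate semi-kernel r^2 * log(r), with the usual convention that it is 0 at r = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def gram_entry(x_i, x_j, n_grid=41):
    """Midpoint-rule approximation of the double integral of the semi-kernel
    evaluated at the distance between (t, x_j(t)) and (s, x_i(s))."""
    h = 1.0 / n_grid
    total = 0.0
    for a in range(n_grid):
        t = (a + 0.5) * h
        for b in range(n_grid):
            s = (b + 0.5) * h
            total += semi_kernel(math.hypot(t - s, x_j(t) - x_i(s)))
    return total * h * h

def null_space_row(x, n_grid=200):
    """Midpoint-rule integrals of the basis functions 1, t and x(t) along one curve."""
    h = 1.0 / n_grid
    mids = [(k + 0.5) * h for k in range(n_grid)]
    return [sum(1.0 for _ in mids) * h,
            sum(t for t in mids) * h,
            sum(x(t) for t in mids) * h]

# Two toy curves with values in [0, 1] (hypothetical).
def curve_a(t):
    return t

def curve_b(t):
    return t * t

entry_ab = gram_entry(curve_a, curve_b)
entry_ba = gram_entry(curve_b, curve_a)
row_a = null_space_row(curve_a)
```

The first two entries of every row are the constants 1 and 1/2 regardless of the curve, which is the collinearity behind the identifiability remark above.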

4.1 Simulations 

Our first simulation study compares our estimate with two other estimates. The
first method uses the well-known functional principal component analysis (FPCA) approach.
The second method uses the P-spline approach in [13], where one estimates $F$ using tensor-
product B-splines with roughness penalties. The simulation setting is the same as the
setting of [9] and [13]. The random predictor function $X$ was generated as

$$X(t) = \zeta_1Z_1 + \sqrt{2}\sum_{k=2}^{50} \zeta_kZ_k\cos(k\pi t), \quad t \in [0, 1],$$

where the $Z_k$ are independently sampled from the uniform distribution on $[-\sqrt{3}, \sqrt{3}]$.
Obviously, the $\zeta_k^2$ are eigenvalues of the covariance function of $X$. Consider two cases for
the $\zeta_k$: the "closely spaced" case and the "well spaced" case. For the well spaced case,
$\zeta_k = (-1)^{k+1}k^{-\nu/2}$ with $\nu = 1.1$ and $2$. For the closely spaced case, $\zeta_1 = 1$,
$\zeta_j = 0.2(-1)^{j+1}(1 - 0.0001j)$ for $j = 2, 3, 4$, and $\zeta_{5j+k} = 0.2(-1)^{5j+k+1}\{(5j)^{-\nu/2} - 0.0001k\}$ for
$j \ge 1$ and $0 \le k \le 4$. The true coefficient function $\beta_0$ was given by

$$\beta_0(t) = 0.3 + \sum_{k=2}^{50} 4\sqrt{2}\,(-1)^{k+1}k^{-2}\cos(k\pi t), \quad t \in [0, 1].$$

Table 1: The root mean squared prediction errors (RMSPE) of three estimators for a
functional linear regression model where $Y = \int \beta_0(t)X(t)\,dt + \epsilon$. FPCA is an estimator
for the functional linear model based on functional principal components analysis. P-spline
is the estimator of [13]. "ThinSpline" is our proposed estimator using a thin-plate spline.

                  sigma   nu     FPCA   P-Spline   ThinSpline
Well spaced       0.5     1.1    0.61   0.82       0.68
                  0.5     2.0    0.52   0.55       0.56
                  1.0     1.1    1.21   1.65       1.20
                  1.0     2.0    1.04   1.09       1.08
Closely spaced    0.5     1.1    0.52   0.53       0.52
                  0.5     2.0    0.54   0.55       0.56
                  1.0     1.1    1.03   1.07       1.03
                  1.0     2.0    1.06   1.05       1.04

The simulation study was performed when the functional linear regression model is the true
model. The response variable $Y$ is simulated from the model $Y = \int_0^1 \beta_0(t)X(t)\,dt + \epsilon$,
where the error $\epsilon \sim N(0, \sigma^2)$ with $\sigma = 0.5$ and $1$. The performance of different estimators
is measured by the root mean squared prediction error,

$$\mathrm{RMSPE} = \sqrt{d^{-1}\sum_{i=1}^{d}(\hat Y_i - Y_i)^2},$$

where $d$ is the sample size of the test data and the $\hat Y_i$ are predicted values. Each training set
contains 67 curves and 33 curves are used for the test set. For each setting, the experiment is
repeated 1000 times. The results of the simulations are summarized in Table 1. We observe that
our thin-plate spline estimator performs comparably to the functional PCA estimator,
even though this is an ideal setting for the latter since the functional linear model holds.
Also, our estimator performs at least as well as the P-spline estimator in most settings.
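The data-generating mechanism of this first study can be sketched as follows for the well spaced case. The quadrature grid, the seed, and all function names are assumptions; only the formulas for the predictor curve, the coefficient function, and the RMSPE come from the text. The integral is also available in closed form by orthogonality of the cosine basis, which the snippet uses as a check.

```python
import math
import random

K_MAX = 50  # number of basis terms in the expansions

def zeta_well_spaced(k, nu):
    """Well spaced coefficients (-1)^(k+1) * k^(-nu/2)."""
    return (-1.0) ** (k + 1) * k ** (-nu / 2.0)

def draw_scores(rng):
    """Independent uniform scores on [-sqrt(3), sqrt(3)] (unit variance)."""
    bound = math.sqrt(3.0)
    return [rng.uniform(-bound, bound) for _ in range(K_MAX)]

def predictor(t, z, nu):
    value = zeta_well_spaced(1, nu) * z[0]
    for k in range(2, K_MAX + 1):
        value += math.sqrt(2.0) * zeta_well_spaced(k, nu) * z[k - 1] * math.cos(k * math.pi * t)
    return value

def beta0(t):
    value = 0.3
    for k in range(2, K_MAX + 1):
        value += 4.0 * math.sqrt(2.0) * (-1.0) ** (k + 1) * k ** (-2.0) * math.cos(k * math.pi * t)
    return value

def linear_signal(z, nu, n_grid=401):
    """Trapezoidal approximation of the integral of beta0(t) * X(t) over [0, 1]."""
    h = 1.0 / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        t = i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * beta0(t) * predictor(t, z, nu)
    return total * h

def rmspe(y_true, y_pred):
    """Root mean squared prediction error over a test set."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

rng = random.Random(0)
z = draw_scores(rng)
signal = linear_signal(z, nu=1.1)
y = signal + rng.gauss(0.0, 0.5)  # one simulated response with sigma = 0.5

# Closed form of the same integral, using orthogonality of the cosine terms.
closed_form = 0.3 * zeta_well_spaced(1, 1.1) * z[0] + sum(
    4.0 * (-1.0) ** (k + 1) * k ** (-2.0) * zeta_well_spaced(k, 1.1) * z[k - 1]
    for k in range(2, K_MAX + 1))
```

Repeating the draw 100 times and splitting into 67 training and 33 test curves would reproduce the layout of one replication described above.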

Next, we perform a simulation study to compare our estimate with the piecewise con-
stant fit proposed in [16] when the additive functional model holds. The simulation setting
is the same as that in [16]. The predictor functions are generated according to

$$X(t) = \cos(U_1)\sin\Big(\frac{1}{5}\pi t\Big) + \sin(U_1)\cos\Big(\frac{1}{5}\pi t\Big) + \cos(U_2)\sin\Big(\frac{2}{5}\pi t\Big) + \sin(U_2)\cos\Big(\frac{2}{5}\pi t\Big)$$

for $t \in [0, 10]$, where $U_1$ and $U_2$ are i.i.d. Uniform$[0, 2\pi]$. The sample size for the training
data is $n = 200$ and for the testing data is $d = 1000$. The data are generated from
two different nonlinear functional models: (i) $Y = \int_0^{10} \cos\{t - X(t) - 5\}\,dt + \epsilon$; (ii) $Y =
\int_0^{10} t\exp\{X(t)\}\,dt + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$. For each setting, the experiment is repeated
50 times. The means and the corresponding standard deviations of the root mean squared
prediction error are given in Table 2. As expected, the functional PCA approach fails for
these two examples as it has large prediction errors. In addition, our thin-plate spline
estimate outperforms the piecewise constant fit (PCF) proposed in [16]. An additional
tuning data set with sample size 200 is used to select the needed regularization parameter
in the original simulation of PCF by [16]. A benefit of our approach is that we do not
require this tuning data set in our simulations.

Table 2: The root mean squared prediction errors (RMSPE) based on three different esti-
mators for two nonlinear functional regression models. PCF is the piecewise constant fit of [16].

Model                                            sigma   FPCA            PCF             SSpline
Y = \int_0^{10} cos{t - X(t) - 5} dt + eps       2       2.434 (0.018)   2.200 (0.056)   2.108 (0.062)
                                                 1       1.723 (0.013)   1.156 (0.037)   1.127 (0.035)
                                                 0.5     1.494 (0.011)   0.680 (0.035)   0.569 (0.026)
Y = \int_0^{10} t exp{X(t)} dt + eps             1       9.828 (0.106)   1.119 (0.029)   1.108 (0.031)
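A minimal sketch of the data generation for the two nonlinear models is shown below. The trapezoidal quadrature, the grid size, the seed, and the function names are assumptions for illustration; the curve and the two integrands follow the formulas in the text.

```python
import math
import random

def make_curve(u1, u2):
    """Predictor curve on [0, 10] driven by two random phases."""
    def x(t):
        return (math.cos(u1) * math.sin(math.pi * t / 5.0)
                + math.sin(u1) * math.cos(math.pi * t / 5.0)
                + math.cos(u2) * math.sin(2.0 * math.pi * t / 5.0)
                + math.sin(u2) * math.cos(2.0 * math.pi * t / 5.0))
    return x

def integrate(g, lower, upper, n_grid=2001):
    """Trapezoidal rule for the integral of g over [lower, upper]."""
    h = (upper - lower) / (n_grid - 1)
    total = 0.5 * (g(lower) + g(upper))
    for i in range(1, n_grid - 1):
        total += g(lower + i * h)
    return total * h

def signal_model_i(x):
    return integrate(lambda t: math.cos(t - x(t) - 5.0), 0.0, 10.0)

def signal_model_ii(x):
    return integrate(lambda t: t * math.exp(x(t)), 0.0, 10.0)

def simulate(n, sigma, signal, rng):
    """Draw n (curve, response) pairs with Gaussian noise of standard deviation sigma."""
    data = []
    for _ in range(n):
        x = make_curve(rng.uniform(0.0, 2.0 * math.pi), rng.uniform(0.0, 2.0 * math.pi))
        data.append((x, signal(x) + rng.gauss(0.0, sigma)))
    return data

rng = random.Random(1)
train = simulate(5, 1.0, signal_model_i, rng)
example_curve = make_curve(0.4, 1.3)
```

By the angle-addition formula each curve is a sum of two phase-shifted sinusoids, so its values lie in $[-2, 2]$.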

4.2 Application: Canadian Weather Data 

The Canadian weather data example is revisited here. The dataset contains daily tempera- 
ture and precipitation at 35 different locations in Canada averaged over years 1960 to 1994. 
Our goal is to predict the log annual precipitation based on the average daily temperature. 
In [3] it was shown that the functional PCA approach could be problematic, since the eigen- 
functions corresponding to the leading eigenvalues of the covariance function seem not to 
represent the estimated coefficient function well. Therefore, we compare our method with 
the smoothing spline estimate when assuming the functional linear regression model. Under 
this setting, the estimate is given by 

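$$(\hat\alpha, \hat\beta) = \arg\min_{\alpha,\beta}\Big\{\frac{1}{n}\sum_{i=1}^{n}\Big(Y_i - \alpha - \int X_i(t)\beta(t)\,dt\Big)^2 + \lambda\int\big(\beta''(t)\big)^2\,dt\Big\}. \tag{14}$$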

Figure 1 shows the estimated $\hat F_{n\lambda}$ when using the complete data. In order to study
the performance of these estimators, we randomly split the initial sample into two sub-
samples: (a) A learning sample, $(X_i, Y_i)$, $i = 1, \ldots, n_L$, with $n_L = 20$, was used to determine
the estimated coefficient function $\hat\beta_\lambda$ and the estimator $\hat F_{n\lambda}$; (b) A test sample, $(X_i, Y_i)$,
$i = n_L + 1, \ldots, n$, with $n - n_L = 15$, was used to evaluate the quality of the estimation. The
left panel of Figure 2 displays the estimated $\hat F_{n\lambda}$ from the training data set and the right
panel of Figure 2 shows the predicted response versus the observed response for the testing
data using the estimate from the training data. The points are very close to the diagonal
line, which indicates a good fit. We have repeated this procedure 200 times. The mean and
the corresponding standard deviations of the root mean squared prediction errors based on
(14) and our proposed predictor are reported in Table 3.

Figure 1: Estimated surface $\hat F_{n\lambda}(t,x)$ from the Canadian weather data.

It is noteworthy that the prediction error under the additive functional
model is considerably smaller than under the functional linear regression model. Assessing the
goodness-of-fit of different models is an important research topic, which we will pursue in future
studies.

Figure 2: Left: Estimated surface $\hat F_{n\lambda}(t,x)$ from the training data; Right: the predicted
response versus the observed response for the testing data.

Table 3: The root mean squared prediction errors based on the estimate (14) and the
proposed predictor for Canadian weather data.

          FLR               ThinSpline
RMSPE     0.3014 (0.1244)   0.1110 (0.0917)

4.3 Application: CA Air Quality Data

Air pollutants are known to cause serious health problems. Modeling different ground level
air pollutants has been an important research topic for many years. In May 2011, the Cali-
fornia Air Resources Board released the "2011 Air Quality Data", which includes 30 years
of air quality data (1980-2009). This database, available at http://www.arb.ca.gov/aqd/aqdcd/aqdcddld.ht
contains hourly concentrations of pollutants at different locations in California from year
1980 to year 2009. In this study, we will focus on the effect of the trajectories of ozone (O3)
on the maximum level of oxides of nitrogen (NOx) in the city of Sacramento (site 3011 in
the database) between June 1 and August 31 of 2005. The total sample size is $n = 92$. The
left panel of Figure 3 displays the daily trajectories of ground-level concentrations of ozone
in the city of Sacramento in the summer of 2005. For most days, we have the observations
at each hour and there are a few days with some missing observations. The right panel of
Figure 3 gives the maximum level of the ground-level concentrations of oxides of nitrogen
on each day during the summer of 2005 in Sacramento.

Figure 4 shows the estimated $\hat F_{n\lambda}$ when using the complete data. It displays a highly
nonlinear pattern, which suggests that the functional linear model may not fit the data
well. To assess the goodness of fit of the additive functional model, the left panel of Figure
5 plots the residuals on the vertical axis and the fitted responses on the horizontal axis.
The points are randomly dispersed around the horizontal axis and do not show
any systematic pattern. The right panel of Figure 5 plots the fitted values versus the observed
responses. The points are very close to the diagonal line, which indicates a good fit.

Figure 3: Left: Daily trajectories of ground-level concentrations of ozone in the city of
Sacramento in the summer of 2005; Right: The maximum level of the ground-level concen-
trations of oxides of nitrogen on each day in the summer of 2005.

Table 4: The root mean squared prediction errors based on the functional linear regression
(FLR) model and the additive functional model (ThinSpline) for the air quality data.

          FLR               ThinSpline
RMSPE     0.9450 (L6539)    0.6148 (0.0985)

We also compare the performance of the additive functional model with the functional
linear regression model (1). The 92 observations were randomly split into training sets
of size 60 and test sets of size 32. We repeat this procedure 1000 times. The mean and
the corresponding standard deviations of the root mean squared prediction error based on
these two models are reported in Table 4. As expected, our additive functional model
outperforms the functional linear model.

Figure 4: Estimated surface $\hat F_{n\lambda}(t,x)$ from the air quality data.

5 Discussion

We have established the minimax rate of convergence for prediction for the additive
functional model. It is shown that the optimal rate depends on the decay rate
of the eigenvalues of the kernel $C$, which depends on the reproducing kernel and the joint
distribution of the random predictor function at any two points. The minimax theory in the
existing literature on the functional linear regression model is a special case of the current
study.

We have focused on the additive functional model with the squared error loss in this
paper. It should be noted that the method of regularization can be easily extended to
handle other models such as the generalized regression model [H \T5\ \T3\ [6]. We shall leave
these extensions for future papers.

The simulations in this paper study only the estimator using thin-plate splines. For the
case of univariate regression, [23] has shown that a smoothing spline and a P-spline are
asymptotically equivalent. A similar asymptotic equivalence result is expected to hold for
bivariate regression too. So, it is expected that our simulation performance is similar to
that of [13], who used bivariate P-splines to fit the data. However, it should be pointed
out that our results can be applied to more general reproducing kernel Hilbert spaces.

Figure 5: Left: Residual plot; Right: Fitted values versus the observed responses.

It is worth noting that estimating $F_0$ itself is a quite different problem from the prediction
problem discussed in the current paper. For example, for the functional linear regression model,
we may not estimate the coefficient function $\beta_0$ consistently without additional conditions
linking the smoothness of $\beta_0$ and the curves [5]. As an example of additional assumptions,
one might assume the reproducing kernel $K$ and the covariance kernel $G$ are perfectly
aligned, i.e., they share the same set of eigenfunctions. Under this circumstance, we may
estimate $\beta_0$ consistently [25]. When $F_0$ can be estimated consistently under the additive
functional model deserves further study. This issue is important because a consistent estimate
could be used to test for linearity of $F_0$.



6 Proofs 

6.1 Proof of Theorem 2.1

In the following proofs, let $c_i$, $i = 1, 2, \ldots$ be generic constants which change from line to
line.

Since any lower bound for a specific case yields immediately a lower bound for the general
case, to establish lower bounds, we only study the case when the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$. Fix
$\alpha \in (0, 1/8)$. It follows from Theorem 2.5 in [21] that in order to establish the minimax
lower bound for $\mathfrak{R}_n$, for each $n$ we need to find functions $\{F_{jn}, j = 0, \ldots, M\}$ satisfying
the following three conditions:

(a) $F_{jn} \in \mathcal{H}(K)$, $j = 0, \ldots, M$,

(b) $E^*\Big\{\int \big[F_{jn}(t, X_{n+1}(t)) - F_{kn}(t, X_{n+1}(t))\big]\,dt\Big\}^2 \ge 2s$, for $0 \le j < k \le M$,

(c) $\frac{1}{M}\sum_{j=1}^{M} \mathcal{K}(P_j, P_0) \le \alpha\log M$, where $P_j$ denotes the joint distribution of $\{(Y_i, X_i) :
i = 1, \ldots, n\}$ when $F_0 = F_{jn}$ and $\mathcal{K}(\cdot, \cdot)$ is the Kullback-Leibler distance between two
probability measures.

We will specify $M \to \infty$ and $s \to 0$ later. If (a), (b), and (c) are satisfied, then the minimax
lower bound for the rate of convergence of $\mathfrak{R}_n$ has the same order as $s$.

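First we verify part (a). Let $m$ be the smallest integer greater than $c_0n^{1/(2r+1)}$ for some
positive constant $c_0$ to be specified later. For $\omega = (\omega_{m+1}, \ldots, \omega_{2m}) \in \{0, 1\}^m$, let

$$F_\omega = \sum_{j=m+1}^{2m} \omega_j m^{-1/2}K^{1/2}(\phi_j).$$

$F_\omega \in \mathcal{H}(K)$ for all $\omega$ if $K^{1/2}(\phi_j) \in \mathcal{H}(K)$ for all $j$. Thus, we need to show that
$\big\langle K^{1/2}(\phi_j), K(\cdot, (t,x))\big\rangle_{\mathcal{H}(K)} = K^{1/2}(\phi_j)(t,x)$. This result holds since

$$\big\langle K^{1/2}(\phi_j), K(\cdot, (t,x))\big\rangle_{\mathcal{H}(K)} = \big\langle K(\phi_j), K^{1/2}(\cdot, (t,x))\big\rangle_{\mathcal{H}(K)} = \big\langle \phi_j, K^{1/2}(\cdot, (t,x))\big\rangle_{L_2} = K^{1/2}(\phi_j)(t,x).$$

We also have

$$\int\!\!\int E\Big[K^{1/2}(\phi_j)(t, X(t))\,K^{1/2}(\phi_k)(s, X(s))\Big]\,dt\,ds = \langle \phi_j, C(\phi_k)\rangle_{L_2} = \rho_k\delta_{jk},$$

where $\delta_{jk} = 1$ for $j = k$, and $0$ for $j \ne k$.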

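Further, the Varshamov-Gilbert bound (see [21], p. 104) shows that, for $m \ge 8$, there
exists a subset $\Omega = \{\omega^0, \omega^1, \ldots, \omega^M\} \subset \{0, 1\}^m$ such that $\omega^0 = (0, \ldots, 0)$,

$$d(\omega^j, \omega^k) \ge \frac{m}{8}, \quad \forall\; 0 \le j < k \le M, \tag{15}$$

where $d(\omega, \omega') = \sum_{j=m+1}^{2m} 1(\omega_j \ne \omega'_j)$ is the Hamming distance between $\omega$ and $\omega'$, and
$M \ge 2^{m/8}$.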

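To verify part (b), for $\omega, \omega' \in \Omega$, direct calculation yields that

$$E^*\Big\{\int \big[F_\omega(t, X(t)) - F_{\omega'}(t, X(t))\big]\,dt\Big\}^2
= \sum_{j=m+1}^{2m}\sum_{k=m+1}^{2m} m^{-1}(\omega_j - \omega'_j)(\omega_k - \omega'_k)\int\!\!\int E^*\Big[K^{1/2}(\phi_j)(t, X(t))\,K^{1/2}(\phi_k)(s, X(s))\Big]\,dt\,ds$$
$$= \sum_{k=m+1}^{2m} m^{-1}(\omega_k - \omega'_k)^2\rho_k \ge m^{-1}\rho_{2m}\,d(\omega, \omega') \ge c_1m^{-1}(2m)^{-2r}\,m/8 \ge c_2n^{-2r/(2r+1)}$$

by (15), $\rho_k \asymp k^{-2r}$, and the definition of $m$. Hence, $s$ in part (b) is of order $n^{-2r/(2r+1)}$.

Next, observe that for any $\omega, \omega' \in \Omega$,

$$\log(P_{\omega'}/P_\omega) = \frac{1}{\sigma^2}\sum_{i=1}^{n}\Big(Y_i - \int F_\omega(t, X_i(t))\,dt\Big)\int \big[F_{\omega'}(t, X_i(t)) - F_\omega(t, X_i(t))\big]\,dt
- \frac{1}{2\sigma^2}\sum_{i=1}^{n}\Big[\int \big\{F_{\omega'}(t, X_i(t)) - F_\omega(t, X_i(t))\big\}\,dt\Big]^2.$$

Therefore,

$$\mathcal{K}(P_{\omega'}, P_\omega) = \frac{n}{2\sigma^2}E\Big\{\int \big[F_\omega(t, X(t)) - F_{\omega'}(t, X(t))\big]\,dt\Big\}^2
= \frac{n}{2\sigma^2}\sum_{k=m+1}^{2m} m^{-1}(\omega_k - \omega'_k)^2\rho_k
\le \frac{n\rho_m}{2\sigma^2m}\sum_{k=m+1}^{2m}(\omega_k - \omega'_k)^2 \le \frac{n\rho_m}{2\sigma^2}.$$

Since $m$ is the smallest integer greater than $c_0n^{1/(2r+1)}$, this implies that

$$\frac{1}{M}\sum_{j=1}^{M} \mathcal{K}(P_j, P_0) \le c_3n^{1/(2r+1)} \le \alpha\log M,$$

if we choose $c_0 \ge 8c_3/(\alpha\log 2)$ and $M = 2^{m/8}$. This completes the proof of Theorem 2.1.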

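6.2 Proofs of Theorem 3.1 and Theorem 3.2

Proof of Theorem 3.1. Define the subspace of $\mathcal{H}_1$,

$$\tilde{\mathcal{H}}_1 = \mathrm{span}\Big\{\int K\big((t,x);(s, X_i(s))\big)\,ds : i = 1, \ldots, n\Big\}.$$

Note that $\tilde{\mathcal{H}}_1$ is a closed linear subspace of $\mathcal{H}_1$. For any $F \in \mathcal{H}$, one may write

$$F = F_0 + F_1 + \delta,$$

where $F_0 \in \mathcal{H}_0$, $F_1 \in \tilde{\mathcal{H}}_1$ and $\delta \in \mathcal{H}_1 \ominus \tilde{\mathcal{H}}_1$. Observe that

$$\eta_F(X_i) = \int F(t, X_i(t))\,dt = \eta_{F_0+F_1}(X_i),$$

because

$$\int \delta(t, X_i(t))\,dt = \Big\langle \delta, \int K\big(\cdot\,;(s, X_i(s))\big)\,ds\Big\rangle_{\mathcal{H}(K)} = 0.$$

Further, due to orthogonality, $\|F\|^2_{\mathcal{H}(K)} = \|F_0 + F_1\|^2_{\mathcal{H}(K)} + \|\delta\|^2_{\mathcal{H}(K)}$ and $\|F_0 + F_1\|^2_{\mathcal{H}(K)} \le \|F\|^2_{\mathcal{H}(K)}$.
Therefore, the minimum of (5) must belong to the linear space $\mathcal{H}_0 \oplus \tilde{\mathcal{H}}_1$. □

Proof of Theorem 3.2. Note that $K^{1/2}(L_2) = \mathcal{H}(K)$. So there exist $G_0$ and $\hat G_\lambda$ such that
$F_0 = K^{1/2}(G_0)$ and $\hat F_{n\lambda} = K^{1/2}(\hat G_\lambda)$. Therefore,

$$\eta_{F_0}(X) = \int F_0(t, X(t))\,dt = \int \big\langle K\big(\cdot\,;(s, X(s))\big), F_0\big\rangle_{\mathcal{H}(K)}\,ds
= \int \big\langle K^{1/2}\big(\cdot\,;(s, X(s))\big), G_0\big\rangle_{L_2}\,ds,$$

and

$$\mathfrak{R}_n = E^*\Big\{\int \big\langle K^{1/2}\big(\cdot\,;(s, X(s))\big), \hat G_\lambda - G_0\big\rangle_{L_2}\,ds\Big\}^2 = \big\|\hat G_\lambda - G_0\big\|_C^2,$$

where $\|G\|_C^2 = \langle C(G), G\rangle_{L_2}$. Write

$$C_n\big((t,x);(s,y)\big) = \frac{1}{n}\sum_{i=1}^{n}\int\!\!\int K^{1/2}\big((t,x);(u, X_i(u))\big)\,K^{1/2}\big((s,y);(v, X_i(v))\big)\,du\,dv.$$

Recall that $Y_i = \int \big\langle K^{1/2}\big(\cdot\,;(s, X_i(s))\big), G_0\big\rangle_{L_2}\,ds + \epsilon_i$. Denote
$g_n = \frac{1}{n}\sum_{i=1}^{n} \epsilon_i\int K^{1/2}\big(\cdot\,;(s, X_i(s))\big)\,ds$. Then, $\hat G_\lambda = (C_n + \lambda I)^{-1}\big(C_n(G_0) + g_n\big)$.
Define $\tilde G_\lambda = (C + \lambda I)^{-1}C(G_0)$. It follows from the triangle inequality that

$$\big\|\hat G_\lambda - G_0\big\|_C \le \big\|\tilde G_\lambda - G_0\big\|_C + \big\|\hat G_\lambda - \tilde G_\lambda\big\|_C. \tag{16}$$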



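Let us first bound the first term on the right-hand side of (16). Recall that the $\phi_k$ are the
eigenfunctions of $C$. Write $G_0 = \sum_{k=1}^{\infty} a_k\phi_k$. Then,

$$\tilde G_\lambda = \sum_{k=1}^{\infty} \frac{a_k\rho_k}{\lambda + \rho_k}\phi_k,$$

and

$$\big\|\tilde G_\lambda - G_0\big\|_C^2 = \sum_{k=1}^{\infty} \frac{\lambda^2a_k^2\rho_k}{(\lambda + \rho_k)^2} \le \lambda^2\max_{k \ge 1}\frac{\rho_k}{(\lambda + \rho_k)^2}\sum_{k=1}^{\infty} a_k^2 \le \frac{\lambda}{4}\|G_0\|_{L_2}^2.$$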

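Next, let us bound the second term on the right-hand side of (16). Recall that
$(C_n + \lambda I)\hat G_\lambda = C_n(G_0) + g_n$. We observe that

$$\tilde G_\lambda - \hat G_\lambda = (C + \lambda I)^{-1}(C_n + \lambda I)(\tilde G_\lambda - \hat G_\lambda) + (C + \lambda I)^{-1}(C - C_n)(\tilde G_\lambda - \hat G_\lambda)$$
$$= (C + \lambda I)^{-1}(C_n + \lambda I)\tilde G_\lambda - (C + \lambda I)^{-1}C_nG_0 - (C + \lambda I)^{-1}g_n + (C + \lambda I)^{-1}(C - C_n)(\tilde G_\lambda - \hat G_\lambda)$$
$$= (C + \lambda I)^{-1}C_n(\tilde G_\lambda - G_0) + \lambda(C + \lambda I)^{-2}CG_0 - (C + \lambda I)^{-1}g_n + (C + \lambda I)^{-1}(C - C_n)(\tilde G_\lambda - \hat G_\lambda)$$
$$= (C + \lambda I)^{-1}C(\tilde G_\lambda - G_0) + \lambda(C + \lambda I)^{-2}CG_0 - (C + \lambda I)^{-1}g_n + (C + \lambda I)^{-1}(C_n - C)(\tilde G_\lambda - G_0) + (C + \lambda I)^{-1}(C - C_n)(\tilde G_\lambda - \hat G_\lambda)$$
$$= \mathrm{I} + \mathrm{II} + \mathrm{III} + \mathrm{IV} + \mathrm{V}.$$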



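We now bound the five terms on the right-hand side separately. Direct calculation yields that

$$\|\mathrm{I}\|_C^2 = \big\|(C + \lambda I)^{-1}C(\tilde G_\lambda - G_0)\big\|_C^2 \le \lambda^2\max_{k \ge 1}\frac{\rho_k^3}{(\lambda + \rho_k)^4}\|G_0\|_{L_2}^2 = O(\lambda).$$

Similarly,

$$\|\mathrm{II}\|_C^2 = \big\|\lambda(C + \lambda I)^{-2}CG_0\big\|_C^2 = \lambda^2\sum_{k=1}^{\infty}\frac{a_k^2\rho_k^3}{(\lambda + \rho_k)^4} \le O(\lambda)\,\|G_0\|_{L_2}^2.$$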

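Next, we make use of three auxiliary results whose proofs are similar to those in Cai and
Yuan (2012), so we omit the details. If there exists a constant $c > 0$ such that

$$E\Big(\int F(t, X(t))\,dt\Big)^4 \le c\Big(E\Big(\int F(t, X(t))\,dt\Big)^2\Big)^2,$$

then for any $\nu > 0$ such that $2r(1 - 2\nu) > 1$,

$$\big\|C^{\nu}(C + \lambda I)^{-1}(C - C_n)C^{-\nu}\big\|_{op} = O_p\Big(\big(n\lambda^{1-2\nu+1/(2r)}\big)^{-1/2}\Big), \tag{17}$$

and

$$\big\|C^{1/2}(C + \lambda I)^{-1}(C - C_n)C^{-\nu}\big\|_{op} = O_p\Big(\big(n\lambda^{1/(2r)}\big)^{-1/2}\Big), \tag{18}$$

where $\|\cdot\|_{op}$ stands for the usual operator norm. Further, for any $0 \le \nu \le 1/2$,

$$\big\|C^{\nu}(C + \lambda I)^{-1}g_n\big\|_{L_2} = O_p\Big(\big(n\lambda^{1-2\nu+1/(2r)}\big)^{-1/2}\Big). \tag{19}$$

Using (17) we have

$$\big\|C^{\nu}(C + \lambda I)^{-1}(C - C_n)(\tilde G_\lambda - \hat G_\lambda)\big\|_{L_2} \le \big\|C^{\nu}(C + \lambda I)^{-1}(C - C_n)C^{-\nu}\big\|_{op}\big\|C^{\nu}(\tilde G_\lambda - \hat G_\lambda)\big\|_{L_2}
\le o_p(1)\big\|C^{\nu}(\tilde G_\lambda - \hat G_\lambda)\big\|_{L_2}$$

whenever $\lambda \ge cn^{-2r/(2r+1)}$ for some constant $c > 0$. Similarly,

$$\big\|C^{\nu}(C + \lambda I)^{-1}(C - C_n)(\tilde G_\lambda - G_0)\big\|_{L_2} \le \big\|C^{\nu}(C + \lambda I)^{-1}(C - C_n)C^{-\nu}\big\|_{op}\big\|C^{\nu}(\tilde G_\lambda - G_0)\big\|_{L_2}
\le o_p(1)\big\|C^{\nu}(\tilde G_\lambda - G_0)\big\|_{L_2}.$$

So, for $0 < \nu < 1/2 - 1/(4r)$,

$$\big\|C^{\nu}(\tilde G_\lambda - \hat G_\lambda)\big\|_{L_2} \le \big\|C^{\nu}(C + \lambda I)^{-1}C(\tilde G_\lambda - G_0)\big\|_{L_2} + \lambda\big\|C^{\nu}(C + \lambda I)^{-2}CG_0\big\|_{L_2} + \big\|C^{\nu}(C + \lambda I)^{-1}g_n\big\|_{L_2}$$
$$+ \big\|C^{\nu}(C + \lambda I)^{-1}(C - C_n)(\tilde G_\lambda - G_0)\big\|_{L_2} + \big\|C^{\nu}(C + \lambda I)^{-1}(C - C_n)(\tilde G_\lambda - \hat G_\lambda)\big\|_{L_2},$$

so that $\big\|C^{\nu}(\tilde G_\lambda - \hat G_\lambda)\big\|_{L_2} = O_p(\lambda^{\nu})$
when $c_1n^{-2r/(2r+1)} \le \lambda \le c_2n^{-2r/(2r+1)}$ for $0 < c_1 < c_2 < \infty$. Next,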



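$$\|\mathrm{IV}\|_C = \big\|(C + \lambda I)^{-1}(C_n - C)(\tilde G_\lambda - G_0)\big\|_C = \big\|C^{1/2}(C + \lambda I)^{-1}(C_n - C)(\tilde G_\lambda - G_0)\big\|_{L_2}$$
$$\le \big\|C^{1/2}(C + \lambda I)^{-1}(C_n - C)C^{-\nu}\big\|_{op}\big\|C^{\nu}(\tilde G_\lambda - G_0)\big\|_{L_2} \le O_p\Big(\big(n\lambda^{1/(2r)}\big)^{-1/2}\lambda^{\nu}\Big) = o_p\Big(\big(n\lambda^{1/(2r)}\big)^{-1/2}\Big).$$

Similarly,

$$\|\mathrm{V}\|_C = \big\|(C + \lambda I)^{-1}(C_n - C)(\tilde G_\lambda - \hat G_\lambda)\big\|_C = \big\|C^{1/2}(C + \lambda I)^{-1}(C_n - C)(\tilde G_\lambda - \hat G_\lambda)\big\|_{L_2}$$
$$\le \big\|C^{1/2}(C + \lambda I)^{-1}(C_n - C)C^{-\nu}\big\|_{op}\big\|C^{\nu}(\tilde G_\lambda - \hat G_\lambda)\big\|_{L_2} \le O_p\Big(\big(n\lambda^{1/(2r)}\big)^{-1/2}\lambda^{\nu}\Big) = o_p\Big(\big(n\lambda^{1/(2r)}\big)^{-1/2}\Big).$$

It follows from (19) that

$$\|\mathrm{III}\|_C = \big\|(C + \lambda I)^{-1}g_n\big\|_C = \big\|C^{1/2}(C + \lambda I)^{-1}g_n\big\|_{L_2} = O_p\Big(\big(n\lambda^{1/(2r)}\big)^{-1/2}\Big).$$

Combining the facts above, we conclude that, if $\lambda$ is of order $n^{-2r/(2r+1)}$, then
$\big\|\hat G_\lambda - G_0\big\|_C^2 = O_p\big(n^{-2r/(2r+1)}\big)$.

□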